AITopics | South San Francisco

Collaborating Authors

Contextualizing biological perturbation experiments through language

Wu, Menghua, Littman, Russell, Levine, Jacob, Qiu, Lin, Biancalani, Tommaso, Richmond, David, Huetter, Jan-Christian

arXiv.org Artificial Intelligence | Feb-28-2025

High-content perturbation experiments allow scientists to probe biomolecular systems at unprecedented resolution, but experimental and analysis costs pose significant barriers to widespread adoption. Machine learning has the potential to guide efficient exploration of the perturbation space and extract novel insights from these data. However, current approaches neglect the semantic richness of the relevant biology, and their objectives are misaligned with downstream biological analyses. In this paper, we hypothesize that large language models (LLMs) present a natural medium for representing complex biological relationships and rationalizing experimental outcomes. We propose PerturbQA, a benchmark for structured reasoning over perturbation experiments. Unlike current benchmarks that primarily interrogate existing knowledge, PerturbQA is inspired by open problems in perturbation modeling: prediction of differential expression and change of direction for unseen perturbations, and gene set enrichment. We evaluate state-of-the-art machine learning and statistical approaches for modeling perturbations, as well as standard LLM reasoning strategies, and we find that current methods perform poorly on PerturbQA. As a proof of feasibility, we introduce Summer (SUMMarize, retrievE, and answeR), a simple, domain-informed LLM framework that matches or exceeds the current state-of-the-art. Our code and data are publicly available at https://github.com/genentech/PerturbQA.

conference paper, differential expression, perturbation, (15 more...)

arXiv.org Artificial Intelligence

2502.21290

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Republic of Türkiye > Corum Province > Corum (0.04)
(6 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction

Zadorozhny, Karina, Chuang, Kangway V., Sathappan, Bharath, Wallace, Ewan, Sresht, Vishnu, Grambow, Colin A.

arXiv.org Artificial Intelligence | Jan-15-2025

Accurate prediction of molecular activities is crucial for efficient drug discovery, yet remains challenging due to limited and noisy datasets. We introduce Similarity-Quantized Relative Learning (SQRL), a learning framework that reformulates molecular activity prediction as relative difference learning between structurally similar pairs of compounds. SQRL uses precomputed molecular similarities to enhance training of graph neural networks and other architectures, and significantly improves accuracy and generalization in low-data regimes common in drug discovery. We demonstrate its broad applicability and real-world potential through benchmarking on public datasets as well as proprietary industry data. Our findings demonstrate that leveraging similarity-aware relative differences provides an effective paradigm for molecular activity prediction.
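The pairing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of Tanimoto similarity on binary fingerprints, and the 0.6 threshold are all assumptions made for illustration. The sketch selects structurally similar pairs from a precomputed similarity matrix and builds relative-difference targets that a model could be trained to predict.

```python
import numpy as np

def tanimoto_matrix(fps: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarity between binary fingerprints (one per row)."""
    fps = fps.astype(bool)
    inter = (fps[:, None, :] & fps[None, :, :]).sum(-1)
    union = (fps[:, None, :] | fps[None, :, :]).sum(-1)
    # Two all-zero fingerprints have an empty union; define their similarity as 0.
    return np.divide(inter, union,
                     out=np.zeros(inter.shape, dtype=float),
                     where=union > 0)

def similar_pairs(fps: np.ndarray, activity: np.ndarray, threshold: float = 0.6):
    """Return indices (i, j) of pairs with similarity >= threshold and the
    relative-difference targets activity[i] - activity[j].

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    sim = tanimoto_matrix(fps)
    i, j = np.triu_indices(len(fps), k=1)  # each unordered pair once
    keep = sim[i, j] >= threshold
    i, j = i[keep], j[keep]
    return i, j, activity[i] - activity[j]

# Toy data: three compounds with 4-bit fingerprints and measured activities.
fps = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 0, 1, 1]])
activity = np.array([5.0, 6.5, 4.0])
i, j, delta = similar_pairs(fps, activity)
```

In this toy example only compounds 0 and 1 are similar enough to form a pair, so the training set contains a single relative-difference target.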

molecule, prediction, representation, (13 more...)

arXiv.org Artificial Intelligence

2501.09103

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Mateo County > South San Francisco (0.04)
Europe > United Kingdom > England (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Modeling variable guide efficiency in pooled CRISPR screens with ContrastiveVI+

Weinberger, Ethan, Conrad, Ryan, Ashuach, Tal

arXiv.org Machine Learning | Nov-11-2024

Genetic screens mediated via CRISPR-Cas9 combined with high-content readouts have emerged as powerful tools for biological discovery. However, computational analyses of these screens come with additional challenges beyond those found with standard scRNA-seq analyses. For example, perturbation-induced variations of interest may be subtle and masked by other dominant sources of variation shared with controls, and variable guide efficiency results in some cells not undergoing genetic perturbation despite expressing a guide RNA. While a number of methods have been developed to address the former problem by explicitly disentangling perturbation-induced variations from those shared with controls, less attention has been paid to the latter problem of noisy perturbation labels. To address this issue, here we propose ContrastiveVI+, a generative modeling framework that both disentangles perturbation-induced from non-perturbation-related variations and infers whether cells truly underwent genomic edits. Applied to three large-scale Perturb-seq datasets, we find that ContrastiveVI+ better recovers known perturbation-induced variations compared to previous methods while successfully identifying cells that escaped the functional consequences of guide RNA expression. An open-source implementation of our model is available at https://github.

perturbation, perturbation-induced variation, variation, (14 more...)

arXiv.org Machine Learning

2411.08072

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > San Mateo County > South San Francisco (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

JAMUN: Transferable Molecular Conformational Ensemble Generation with Walk-Jump Sampling

Daigavane, Ameya, Vani, Bodhi P., Saremi, Saeed, Kleinhenz, Joseph, Rackers, Joshua

arXiv.org Artificial Intelligence | Oct-18-2024

Proteins are not well characterized as single structures, as has traditionally been the case, but rather as ensembles of structures with an ergodic probability distribution (Henzler-Wildman & Kern, 2007). Protein motion is required for myoglobin to bind oxygen and move it around the body (Miller & Phillips, 2021). Drug discovery on protein kinases depends on characterizing kinase conformational ensembles (Gough & Kalodimos, 2024). The search for druggable 'cryptic pockets' requires understanding protein dynamics, and antibody design is deeply affected by conformational ensembles (Colombo, 2023). However, while machine learning (ML) methods for molecular structure prediction have experienced enormous success recently, ML methods for dynamics have yet to have similar impact. ML models for generating molecular ensembles are widely considered the 'next frontier' (Bowman, 2024; Miller & Phillips, 2021; Zheng et al., 2023).

dataset, ensemble, jamun, (9 more...)

arXiv.org Artificial Intelligence

2410.14621

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > San Mateo County > South San Francisco (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Synthesizing Proton-Density Fat Fraction and $R_2^*$ from 2-point Dixon MRI with Generative Machine Learning

Anand, Suma, Xu, Kaiwen, O'Dushlaine, Colm, Mukherjee, Sumit

arXiv.org Artificial Intelligence | Oct-14-2024

Magnetic Resonance Imaging (MRI) is the gold standard for measuring fat and iron content non-invasively in the body via measures known as Proton Density Fat Fraction (PDFF) and $R_2^*$, respectively. However, conventional PDFF and $R_2^*$ quantification methods operate on MR images voxel-wise and require at least three measurements to estimate three quantities: water, fat, and $R_2^*$. Alternatively, the two-point Dixon MRI protocol is widely used and fast because it acquires only two measurements; however, these cannot be used to estimate three quantities voxel-wise. Leveraging the fact that neighboring voxels have similar values, we propose using a generative machine learning approach to learn PDFF and $R_2^*$ from Dixon MRI. We use paired Dixon-IDEAL data from UK Biobank in the liver and a Pix2Pix conditional GAN to demonstrate the first large-scale $R_2^*$ imputation from two-point Dixon MRIs. Using our proposed approach, we synthesize PDFF and $R_2^*$ maps that show significantly greater correlation with ground-truth than conventional voxel-wise baselines.
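To make the "two measurements, three unknowns" point concrete, the sketch below shows the conventional voxel-wise calculation that two-point Dixon data do support. It is a textbook idealization, not the generative method proposed in the paper: it assumes the in-phase signal equals water plus fat and the opposed-phase signal equals water minus fat, assumes water-dominant voxels, and ignores $R_2^*$ decay entirely, which is exactly the quantity that cannot be recovered this way. The function name is hypothetical.

```python
import numpy as np

def two_point_dixon_fat_fraction(in_phase, opposed_phase, eps=1e-8):
    """Idealized voxel-wise fat fraction from two-point Dixon images.

    Assumed signal model (no R2* decay, water-dominant voxels):
        in_phase      = water + fat
        opposed_phase = water - fat
    Two equations determine two unknowns (water, fat); a third unknown
    such as R2* cannot be solved for from these two measurements alone.
    """
    ip = np.asarray(in_phase, dtype=float)
    op = np.asarray(opposed_phase, dtype=float)
    water = (ip + op) / 2.0
    fat = (ip - op) / 2.0
    # Guard against division by zero in background voxels.
    return fat / np.maximum(water + fat, eps)

# Toy voxel: water signal 80, fat signal 20.
ff = two_point_dixon_fat_fraction([100.0], [60.0])
```

For the toy voxel the recovered fat fraction is 0.2, matching the water and fat signals used to construct it.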

artificial intelligence, machine learning, pdff, (13 more...)

arXiv.org Artificial Intelligence

2410.11186

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > California > San Mateo County > South San Francisco (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

Chen, Benson, Danel, Tomasz, McEnaney, Patrick J., Jain, Nikhil, Novikov, Kirill, Akki, Spurti Umesh, Turnbull, Joshua L., Pandya, Virja Atul, Belotserkovskii, Boris P., Weaver, Jared Bryce, Biswas, Ankita, Nguyen, Dat, Dreiman, Gabriel H. S., Sultan, Mohammad, Stanley, Nathaniel, Whalen, Daniel M, Kanichar, Divya, Klein, Christoph, Fox, Emily, Watts, R. Edward

arXiv.org Artificial Intelligence | Oct-11-2024

DNA-Encoded Libraries (DEL) are combinatorial small molecule libraries that offer an efficient way to characterize diverse chemical spaces. Selection experiments using DELs are pivotal to drug discovery efforts, enabling high-throughput screens for hit finding. However, limited availability of public DEL datasets hinders the advancement of computational techniques designed to utilize such data. To bridge this gap, we present KinDEL, one of the first large, publicly available DEL datasets on two kinases: Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1). Interest in this data modality is growing due to its ability to generate extensive supervised chemical data that densely samples around select molecular structures. Demonstrating one such application of the data, we benchmark different machine learning techniques to develop predictive models for hit identification; in particular, we highlight recent structure-based probabilistic approaches. Finally, we provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules. Data and code for our benchmarks can be found at https://github.com/insitro/kindel.
DNA-Encoded Libraries (DEL) have emerged as a powerful tool in drug discovery, enabling highly efficient screens of small molecule libraries against therapeutically relevant targets (Yuen & Franzini, 2017; Gironda-Martínez et al., 2021; Kunig et al., 2021; Peterson & Liu, 2023). These massive libraries are efficiently constructed through combinatorial synthesis of chemical building blocks, or synthons, with each resulting molecule being assigned a DNA barcode (see Figure 1). DELs are then used in selection experiments against proteins of interest, wherein multiple rounds of washing are conducted to remove any weak binders, and the DNA tags of surviving molecules are sequenced as a measure of binding affinity. Despite the highly efficient throughput of DELs, data generated through these experiments are intrinsically noisy with various sources of bias arising from the DEL synthesis and selection processes, necessitating modern machine learning methods to learn signal from the data. Unfortunately, there is still a lack of large, publicly available DEL datasets and benchmarking tasks to drive this important research area.

artificial intelligence, machine learning, molecule, (17 more...)

arXiv.org Artificial Intelligence

2410.08938

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > San Mateo County > South San Francisco (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

Bunne, Charlotte, Roohani, Yusuf, Rosen, Yanay, Gupta, Ankit, Zhang, Xikun, Roed, Marcel, Alexandrov, Theo, AlQuraishi, Mohammed, Brennan, Patricia, Burkhardt, Daniel B., Califano, Andrea, Cool, Jonah, Dernburg, Abby F., Ewing, Kirsty, Fox, Emily B., Haury, Matthias, Herr, Amy E., Horvitz, Eric, Hsu, Patrick D., Jain, Viren, Johnson, Gregory R., Kalil, Thomas, Kelley, David R., Kelley, Shana O., Kreshuk, Anna, Mitchison, Tim, Otte, Stephani, Shendure, Jay, Sofroniew, Nicholas J., Theis, Fabian, Theodoris, Christina V., Upadhyayula, Srigokul, Valer, Marc, Wang, Bo, Xing, Eric, Yeung-Levy, Serena, Zitnik, Marinka, Karaletsos, Theofanis, Regev, Aviv, Lundberg, Emma, Leskovec, Jure, Quake, Stephen R.

arXiv.org Artificial Intelligence | Sep-17-2024

The cell is arguably the smallest unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision of AI-powered Virtual Cells, where robust representations of cells and cellular systems under different conditions are directly learned from growing biological data across measurements and scales. We discuss desired capabilities of AI Virtual Cells, including generating universal representations of biological entities across scales, and facilitating interpretable in silico experiments to predict and understand their behavior using Virtual Instruments. We further address the challenges, opportunities and requirements to realize this vision including data needs, evaluation strategies, and community standards and engagement to ensure biological accuracy and broad utility. We envision a future where AI Virtual Cells help identify new drug targets, predict cellular responses to perturbations, as well as scale hypothesis exploration. With open science collaborations across the biomedical ecosystem that includes academia, philanthropy, and the biopharma and AI industries, a comprehensive predictive understanding of cell mechanisms and interactions is within reach.

aivc, representation, virtual cell, (16 more...)

arXiv.org Artificial Intelligence

2409.11654

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Washington > King County > Seattle (0.14)
(23 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(4 more...)

Automated Clinical Data Extraction with Knowledge Conditioned LLMs

Li, Diya, Kadav, Asim, Gao, Aijing, Li, Rui, Bourgon, Richard

arXiv.org Artificial Intelligence | Jun-25-2024

The extraction of lung lesion information from clinical and medical imaging reports is crucial for research on and clinical care of lung-related diseases. Large language models (LLMs) can be effective at interpreting unstructured text in reports, but they often hallucinate due to a lack of domain-specific knowledge, leading to reduced accuracy and posing challenges for use in clinical settings. To address this, we propose a novel framework that aligns generated internal knowledge with external knowledge through in-context learning (ICL). Our framework employs a retriever to identify relevant units of internal or external knowledge and a grader to evaluate the truthfulness and helpfulness of the retrieved internal-knowledge rules, to align and update the knowledge bases. Our knowledge-conditioned approach also improves the accuracy and reliability of LLM outputs by addressing the extraction task in two stages: (i) lung lesion finding detection and primary structured field parsing, followed by (ii) further parsing of lesion description text into additional structured fields. Experiments with expert-curated test datasets demonstrate that this ICL approach can increase the F1 score for key fields (lesion size, margin and solidity) by an average of 12.9% over existing ICL methods.

extraction, knowledge, lesion, (16 more...)

arXiv.org Artificial Intelligence

2406.18027

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > San Mateo County > South San Francisco (0.04)

Genre: Research Report > Experimental Study (0.84)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

InteraRec: Screenshot Based Recommendations Using Multimodal Large Language Models

Karra, Saketh Reddy, Tulabandhula, Theja

arXiv.org Artificial Intelligence | Jun-15-2024

Weblogs, which comprise records detailing user activities on a website, offer valuable insights into user preferences, behavior, and interests. Numerous recommendation algorithms, employing strategies such as collaborative filtering, content-based filtering, and hybrid methods, leverage the data mined through these weblogs to provide personalized recommendations to users. Despite the abundance of information available in these weblogs, identifying and extracting pertinent information and key features from them requires extensive engineering effort. The intricate nature of the data also poses a challenge for interpretation, especially for non-experts. In this study, we introduce a sophisticated and interactive recommendation framework denoted as InteraRec, which diverges from conventional approaches that exclusively depend on weblogs for recommendation generation. The InteraRec framework captures high-frequency screenshots of web pages as users navigate through a website. Leveraging state-of-the-art multimodal large language models (MLLMs), it extracts valuable insights into user preferences from these screenshots by generating a textual summary based on predefined keywords. Subsequently, an LLM-integrated optimization setup utilizes this summary to generate tailored recommendations. Through our experiments, we demonstrate the effectiveness of InteraRec in providing users with valuable and personalized offerings. Furthermore, we explore the integration of session-based recommendation systems into the InteraRec framework, aiming to enhance its overall performance. Finally, we curate a new dataset comprising screenshots from product web pages on the Amazon website for the validation of the InteraRec framework. Detailed experiments demonstrate the efficacy of the InteraRec framework in delivering valuable and personalized recommendations tailored to individual user preferences.

interarec framework, recommendation, screenshot, (13 more...)

arXiv.org Artificial Intelligence

2403.00822

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Histopathology Based AI Model Predicts Anti-Angiogenic Therapy Response in Renal Cancer Clinical Trial

Jasti, Jay, Zhong, Hua, Panwar, Vandana, Jarmale, Vipul, Miyata, Jeffrey, Carrillo, Deyssy, Christie, Alana, Rakheja, Dinesh, Modrusan, Zora, Kadel, Edward Ernest III, Beig, Niha, Huseni, Mahrukh, Brugarolas, James, Kapur, Payal, Rajaram, Satwik

arXiv.org Artificial Intelligence | May-28-2024

Background: Predictive biomarkers of treatment response are lacking for metastatic clear cell renal cell carcinoma (ccRCC), a tumor type that is treated with angiogenesis inhibitors, immune checkpoint inhibitors, mTOR inhibitors and a HIF2 inhibitor. The Angioscore, an RNA-based quantification of angiogenesis, is arguably the best candidate to predict anti-angiogenic (AA) response. However, the clinical adoption of transcriptomic assays faces several challenges including standardization, time delay, and high cost. Further, ccRCC tumors are highly heterogeneous, and sampling multiple areas for sequencing is impractical. Approach: Here we present a novel deep learning (DL) approach to predict the Angioscore from ubiquitous histopathology slides. In order to overcome the lack of interpretability, one of the biggest limitations of typical DL models, our model produces a visual vascular network which is the basis of the model's prediction. To test its reliability, we applied this model to multiple cohorts including a clinical trial dataset. Results: Our model accurately predicts the RNA-based Angioscore on multiple independent cohorts (Spearman correlations of 0.77 and 0.73). Further, the predictions help unravel meaningful biology such as association of angiogenesis with grade, stage, and driver mutation status. Finally, we find our model is able to predict response to AA therapy, in both a real-world cohort and the IMmotion150 clinical trial. The predictive power of our model vastly exceeds that of CD31, a marker of vasculature, and nearly rivals the performance (c-index 0.66 vs 0.67) of the ground truth RNA-based Angioscore at a fraction of the cost. Conclusion: By providing a robust yet interpretable prediction of the Angioscore from histopathology slides alone, our approach offers insights into angiogenesis biology and AA treatment response.
Introduction: Patients with metastatic clear cell renal cell carcinoma (ccRCC) are treated with anti-angiogenic (AA) therapies (e.g., vascular endothelial growth factor tyrosine kinase inhibitors, VEGF-TKIs), immune checkpoint inhibitors (ICI), mammalian target of rapamycin (mTOR) inhibitors and a hypoxia inducible factor (HIF)-2 inhibitor, either in combination or as monotherapy (1).
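The c-index values quoted in this abstract (0.66 vs 0.67) are concordance indices. As a reading aid, the sketch below gives a minimal version of Harrell's concordance index for right-censored survival data. It is a generic illustration, not the evaluation code used in the paper; the function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times:       follow-up time per patient
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher score means the model expects an earlier event
    A pair (a, b) is comparable when patient a had an observed event
    strictly before patient b's follow-up time. Ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for a in range(n):
        for b in range(n):
            if events[a] and times[a] < times[b]:
                comparable += 1
                if risk_scores[a] > risk_scores[b]:
                    concordant += 1.0
                elif risk_scores[a] == risk_scores[b]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; c-index is undefined")
    return concordant / comparable

# Toy cohort: the third patient is censored at time 6.
times = [2, 4, 6]
events = [1, 1, 0]
perfect = concordance_index(times, events, [0.9, 0.5, 0.1])
reversed_ranking = concordance_index(times, events, [0.1, 0.5, 0.9])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 means every pair is ranked in the wrong order.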

angioscore, cohort, dl angioscore, (15 more...)

arXiv.org Artificial Intelligence

2405.18327

Country:

North America > United States > Texas > Dallas County > Dallas (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > San Mateo County > South San Francisco (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Kidney Cancer (0.51)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
